Predoop: Preempting Reduce Task for Job Execution Accelerations
Authors
Abstract
Map/Reduce is a popular parallel processing framework for data-intensive computing. To overlap the Map tasks' execution phase with the Reduce task's intermediate data fetching and merging phase, existing Map/Reduce schedulers always pre-launch the Reduce task at a specific threshold of launched Map tasks. This pattern leaves the Reduce task occupying resources during the idle time it spends waiting to fetch intermediate data from the Map tasks. To address this issue, we propose an extended version of the Hadoop Map/Reduce framework, called Predoop. The basic idea of Predoop is to preempt the Reduce task during its idle time and allocate the released resources to the Map tasks that are scheduled to run. To achieve this goal, first, we introduce preemptive mechanisms for Reduce tasks and Map tasks respectively, so that Map/Reduce tasks can be preempted and resumed with correct status; second, we adopt a preempting-resuming model for the Reduce task that considers both the progress of the Reduce task's data fetching and merging and the progress of Map task execution, in order to determine when to preempt and resume the Reduce task; third, we introduce a preemption-aware task scheduling strategy that allocates the released resources to the scheduled Map tasks while taking data locality into account. Experimental results demonstrate that Predoop outperforms Hadoop on various workloads, reducing the average job turnaround time by up to 66.57%.
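The preempt-when-idle idea can be illustrated with a toy decision rule. This is only a sketch under stated assumptions, not Predoop's actual preempting-resuming model, which the abstract does not spell out: the `ReduceState` fields, the `decide` function, and the `resume_backlog` threshold are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReduceState:
    """Hypothetical snapshot of one reduce task's shuffle progress."""
    total_maps: int      # map tasks in the job
    completed_maps: int  # map tasks whose output is ready to fetch
    fetched_maps: int    # map outputs already fetched and merged
    running: bool        # True if the reduce task currently holds its resources


def decide(state: ReduceState, resume_backlog: int = 4) -> str:
    """Return 'preempt', 'resume', or 'keep' for a reduce task.

    Assumed rule (not taken from the paper): preempt a running reduce task
    when it has nothing left to fetch but maps are still pending; resume a
    preempted one when enough unfetched map outputs have piled up, or when
    every map task has finished.
    """
    backlog = state.completed_maps - state.fetched_maps
    maps_done = state.completed_maps == state.total_maps
    if state.running:
        if backlog == 0 and not maps_done:
            return "preempt"  # idle: release resources for map tasks
        return "keep"
    if maps_done or backlog >= resume_backlog:
        return "resume"       # enough work to justify taking resources back
    return "keep"
```

For example, a running reduce task that has merged all 3 available outputs of a 10-map job would be preempted, while a preempted one stays parked until at least `resume_backlog` new outputs are waiting. A real scheduler would also have to save and restore the task's merge state, which this sketch omits.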
Similar resources
A Cross-Jobs-Cross-Phases Map-Reduce Scheduling Algorithm in Heterogeneous Cloud
Map-reduce clouds are viewed as a reasonable and effective platform for processing large-scale data quickly. To address the new scheduling challenges in map-reduce clouds, this paper proposes a cross-jobs-cross-phases (CJCP) map-reduce scheduling algorithm. CJCP mainly consists of four optimization schemes, which respectively deal with four resource-waste scenarios in the job scheduling process....
Resource Provisioning based on Preempting Virtual Machines in Resource Sharing Environments
Resource provisioning is one of the main challenges in large-scale resource sharing environments such as federated Grids. Recently, many resource management systems in these environments have started to use the lease abstraction and virtual machines (VMs) for resource provisioning. In resource sharing environments resource providers serve requests from external users along with their own local ...
Resource provisioning based on preempting virtual machines in distributed systems
Resource provisioning is one of the main challenges in large-scale distributed systems such as federated Grids. Recently, many resource management systems in these environments have started to use the lease abstraction and virtual machines (VMs) for resource provisioning. In the large-scale distributed systems, resource providers serve requests from external users along with their own local use...
Solving Task Scheduling Problem in Cloud Computing Environment Using Orthogonal Taguchi-Cat Algorithm
Received Jan 9, 2017; revised Mar 15, 2017; accepted Apr 8, 2017. In cloud computing datacenters, task execution delay is no longer accidental. In recent times, a number of artificial intelligence scheduling techniques have been proposed and applied to reduce task execution delay. In this study, we propose an algorithm called Orthogonal Taguchi Based-Cat Swarm Optimization (OTB-CSO) to minimize total ta...
Reducing Execution Waste in Priority Scheduling: a Hybrid Approach
Guaranteeing quality for differentiated services while ensuring resource efficiency is an important and yet challenging problem in large computing clusters. Priority scheduling is commonly adopted in production systems to minimize the response time of high-priority workload by means of preempting the execution of low-priority workload when faced with limited resources. As a result, the system p...